
    BiOnt: Deep Learning using Multiple Biomedical Ontologies for Relation Extraction

    Successful biomedical relation extraction can provide researchers and clinicians with evidence of possible unknown associations between biomedical entities, advancing the current knowledge of those entities and their inherent mechanisms. Most biomedical relation extraction systems do not resort to external sources of knowledge, such as domain-specific ontologies. However, combining deep learning methods with biomedical ontologies has recently been shown to effectively advance the biomedical relation extraction field. To perform relation extraction, our deep learning system, BiOnt, employs four types of biomedical ontologies, namely, the Gene Ontology, the Human Phenotype Ontology, the Human Disease Ontology, and the Chemical Entities of Biological Interest, covering gene products, phenotypes, diseases, and chemical compounds, respectively. We tested our system with three data sets that represent three different types of relations between biomedical entities. BiOnt achieved an improvement in F-score of 4.93 percentage points for drug-drug interactions (DDI corpus), 4.99 percentage points for phenotype-gene relations (PGR corpus), and 2.21 percentage points for chemical-induced disease relations (BC5CDR corpus), relative to the state of the art. The code supporting this system is available at https://github.com/lasigeBioTM/BiOnt. Comment: ECIR 202
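As a minimal illustration of the kind of ontology information such systems exploit, the sketch below walks a concept's is-a links up to the root, producing an ancestor path that a BiOnt-style model could consume alongside the sentence. This is not BiOnt's actual code, and the toy hierarchy uses invented labels rather than real ChEBI identifiers.

```python
def ancestor_path(concept, parents):
    """Walk is-a links from a concept up to the ontology root, returning the path."""
    path = [concept]
    while concept in parents:
        concept = parents[concept]
        path.append(concept)
    return path

# Toy is-a hierarchy (child -> parent), keyed by hypothetical concept labels.
toy_ontology = {
    "aspirin": "benzoic acids",
    "benzoic acids": "chemical entity",
}

print(ancestor_path("aspirin", toy_ontology))
# ['aspirin', 'benzoic acids', 'chemical entity']
```

In a real system the parent map would be loaded from the OBO release of each ontology, and the resulting paths embedded as additional model inputs.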

    The clinical trial landscape in oncology and connectivity of somatic mutational profiles to targeted therapies

    BACKGROUND: Precision medicine in oncology relies on rapid associations between patient-specific variations and targeted therapeutic efficacy. Due to the advancement of genomic analysis, a vast literature characterizing cancer-associated molecular aberrations and their therapeutic relevance has been published. However, data are not uniformly reported or readily available, and accessing relevant information in a clinically acceptable time frame is a daunting proposition, hampering connections between patients and appropriate therapeutic options. One important therapeutic avenue for oncology patients is through clinical trials. Accordingly, a global view into the availability of targeted clinical trials would provide insight into strengths and weaknesses and potentially enable research focus. However, data regarding the landscape of clinical trials in oncology are not readily available, and as a result, a comprehensive understanding of clinical trial availability is difficult. RESULTS: To support clinical decision-making, we have developed a data loader and mapper that connects sequence information from oncology patients to data stored in an in-house database, the JAX Clinical Knowledgebase (JAX-CKB), which can be queried readily to access comprehensive data for clinical reporting via customized reporting queries. JAX-CKB functions as a repository to house expertly curated clinically relevant data surrounding our 358-gene panel, the JAX Cancer Treatment Profile (JAX CTP), and supports annotation of the functional significance of molecular variants. Through queries of data housed in JAX-CKB, we have analyzed the landscape of clinical trials relevant to our 358-gene targeted sequencing panel to evaluate strengths and weaknesses in current molecular targeting in oncology. Through this analysis, we have identified patient indications, molecular aberrations, and targeted therapy classes that have strong or weak representation in clinical trials.
CONCLUSIONS: Here, we describe the development of, and disseminate methods for, a system that associates patient genomic sequence data with clinically relevant information, facilitating interpretation and providing a mechanism for informing therapeutic decision-making. Additionally, through customized queries, we can rapidly analyze the landscape of targeted therapies in clinical trials, enabling a unique view into current therapeutic availability in oncology.
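The core matching step such a system performs can be sketched as a join between a patient's mutated genes and trials indexed by targeted gene. The schema and records below are invented for illustration; the abstract does not disclose JAX-CKB's actual schema or query interface.

```python
# Hypothetical trial records: each trial targets a set of genes.
trials = [
    {"trial_id": "T-001", "genes": {"BRAF", "KRAS"}, "indication": "melanoma"},
    {"trial_id": "T-002", "genes": {"EGFR"}, "indication": "lung"},
]

def match_trials(patient_variants, trials):
    """Return ids of trials whose targeted genes overlap the patient's mutated genes."""
    mutated = {v["gene"] for v in patient_variants}
    return [t["trial_id"] for t in trials if t["genes"] & mutated]

# A patient carrying a BRAF V600E variant matches the first trial.
patient = [{"gene": "BRAF", "change": "V600E"}]
print(match_trials(patient, trials))
# ['T-001']
```

In practice the variant side would come from the sequencing pipeline and the trial side from curated knowledgebase records, with additional filters on indication and variant-level functional significance.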

    Text Mining the History of Medicine

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestion of synonyms for user-entered query terms, exploration of different concepts mentioned within search results, or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and changes in vocabulary, terminology, language structure, and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants, and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system.
The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.
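One search feature the abstract describes, expanding a user's query with historical variant forms before matching documents, can be illustrated as follows. This is not the Argo pipeline itself, and the variant table is a toy example standing in for the historical lexical resources.

```python
# Toy variant lexicon: a modern term mapped to its historical forms.
variants = {
    "typhoid": {"typhoid", "enteric fever", "typhus abdominalis"},
}

def expand_query(term, variants):
    """Return the query term plus any known synonyms/variant spellings."""
    return variants.get(term, {term})

def search(term, documents, variants):
    """Match documents containing any expanded form of the query term."""
    forms = expand_query(term, variants)
    return [d for d in documents if any(f in d.lower() for f in forms)]

docs = ["A case of enteric fever in 1854.", "Notes on cholera."]
print(search("typhoid", docs, variants))
# ['A case of enteric fever in 1854.']
```

A production system would index the expanded forms rather than scan documents linearly, but the principle, querying with the full evolving vocabulary of a concept, is the same.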

    Using WormBase: A Genome Biology Resource for Caenorhabditis elegans and Related Nematodes

    WormBase (www.wormbase.org) provides the nematode research community with a centralized database of information pertaining to nematode genes and genomes. As more nematode genome sequences become available and richer data sets are published, WormBase strives to maintain updated information, displays, and services to facilitate efficient access to and understanding of the knowledge generated by the published nematode genetics literature. This chapter aims to explain how to use basic and new features of WormBase, along with some commonly used tools and data queries. Explanations of the curated data and step-by-step instructions for accessing the data via the WormBase website and available data mining tools are provided.

    Standardized metadata for human pathogen/vector genomic sequences

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
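In practice, a checklist-based standard like this is enforced by validating each submission against its required fields. The sketch below shows that pattern; the field names are plausible examples chosen for illustration, not the standard's exact terms.

```python
# Hypothetical subset of required fields from a metadata checklist.
REQUIRED_FIELDS = {
    "organism_name",
    "isolation_source",
    "collection_date",
    "geographic_location",
}

def missing_fields(record):
    """Return required metadata fields that are absent or empty in a submission."""
    return sorted(f for f in REQUIRED_FIELDS if not record.get(f))

record = {
    "organism_name": "Plasmodium falciparum",
    "collection_date": "2014-06-01",
    "isolation_source": "",
    "geographic_location": "Mali",
}
print(missing_fields(record))
# ['isolation_source']
```

Because the standard maps its fields to existing ontology terms (MIxS, OBI), a fuller validator would also check that field values resolve to valid terms rather than only checking presence.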